---
{
    "title": "HLL",
    "language": "en"
}
---

<!-- 
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements.  See the NOTICE file
distributed with this work for additional information
regarding copyright ownership.  The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License.  You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied.  See the License for the
specific language governing permissions and limitations
under the License.
-->

# HLL
## Description
HLL is an engineering implementation based on the HyperLogLog algorithm. It is used to store the intermediate results of the HyperLog calculation process. It can only be used as the value column type of the table.
By aggregating to reduce the amount of data continuously, in order to achieve the purpose of speeding up the query, based on which an estimated result, the error is about 1%.
The HLL column is generated by other columns or data in the imported data. When imported, the hll_hash function is used to specify which column in the data is used to generate the HLL column.
It is often used to replace count distinct, and to quickly calculate UV in business by combining rollup.

The correlation function:

TOTAL UNION
This function is an aggregation function, which is used to calculate the cardinality estimation of all data satisfying the conditions. This function can also be used to analyze functions. It only supports the default window and does not support the window clause.

Coach L.u RAW AGG
This function is an aggregation function that aggregates HLL type fields and returns HLL type.

HLL_CARDINALITY(hll)
This function is used to estimate the cardinality of a single HLL sequence

HLL_HASH(column_name)
Generate HLL column types for insert or import, see the instructions for the use of imports

EMPTY_HLL()
Generate empty HLL column types for insert or import, see the instructions for the use of imports

## example
1. First create a table with HLL columns
create table test(
dt date,
id int,
name char(10),
Province of char (10),
The char (1),
the European Union,
European Union
distributed by hash(id) buckets 32;

2. Import data. See help curl for the way you import it.

    A. Generate HLL columns using columns in tables

        curl --location-trusted -uname:password -T data -H "label:load_1" -H "columns:dt, id, name, province, os, set1=hll_hash(id), set2=hll_hash(name)"
            http://host/api/test_db/test/_stream_load

    B. Generate HLL columns using a column in the data

        curl --location-trusted -uname:password -T data -H "label:load_1" -H "columns:dt, id, name, province, sex, cuid, os, set1=hll_hash(cuid), set2=hll_hash(os)"
            http://host/api/test_db/test/_stream_load

3. There are three common ways of aggregating data: (without aggregating the base table directly, the speed may be similar to that of using APPROX_COUNT_DISTINCT directly)

A. Create a rollup that allows HLL columns to generate aggregation.
alter table test add rollup test_rollup(dt, set1);

B. Create another table dedicated to computing uv, and insert data)

create table test_uv(
dt date,
uv_set hll hll_union)
distributed by hash(id) buckets 32;

insert into test_uv select dt, set1 from test;

C. Create another table dedicated to computing uv, then insert and generate HLL columns from other non-hll columns of test through hll_hash

create table test_uv(
dt date,
id_set hll hll_union)
distributed by hash(id) buckets 32;

insert into test_uv select dt, hll_hash(id) from test;

4. Query, HLL column is not allowed to query its original value directly, it can be queried by matching functions.

a. 27714; 24635; uv
select HLL_UNION_AGG(uv_set) from test_uv;

B. Seek every day's UV
select dt, HLL_CARDINALITY(uv_set) from test_uv;

C. Find the aggregate value of Set1 in the test table
select dt, HLL_CARDINALITY(uv) from (select dt, HLL_RAW_AGG(set1) as uv from test group by dt) tmp;
select dt, HLL_UNION_AGG(set1) as uv from test group by dt;

## keyword
HLL
